A complete beginner's guide to Snowpark in dbt from Coalesce 2023

A Snowflake Solutions Architect at Aimpoint Digital shares what Snowpark is, how it works, and how to use it in dbt.

"It's an execution framework…It's not a specific syntax. It's just a way of querying data that's not SQL."

Christopher Marland, Snowflake Solutions Architect at Aimpoint Digital, shares what Snowpark is, how it works, and how to use it in dbt.

Snowpark provides a method for querying data in Snowflake using languages other than SQL

Snowpark, as explained by Christopher, is a set of runtimes and libraries that allow developers to interact with Snowflake using languages other than SQL.

He explains, "It's an execution framework…It's not a specific syntax. It's just a way of querying data that's not SQL." He also clarifies that Snowpark provides the tools and environments necessary for developers to use different programming languages, like Python, to interact with Snowflake.

Python models offer more flexibility in data manipulation and transformation

Python models in dbt allow developers to execute Python code in their data platform. If that data platform is Snowflake, developers use the Snowpark execution framework. The flexibility of Python models is emphasized by Christopher as he demonstrates how Python models can be used for complex data transformation tasks that would be difficult to achieve with SQL models.

Christopher explains that Python models in dbt are especially useful when dealing with complex data transformations that require a higher degree of flexibility and dynamic functionality. Operations that would be difficult or impossible to perform using SQL can be achieved using Python, making it a valuable tool in a data engineer's toolkit.

Python models in Snowpark should be used sparingly and for specific use cases

Despite the benefits of Python models, Chistoper advises that they should only make up a small percentage of a project. "Only about 5% of your project should be in Python at the absolute maximum," Christopher states. He explains this is due to the added complexity and cost of having data engineers who are also Python specialists.

Christopher’s key insights

  • Snowpark is a set of runtimes and libraries that allow people to develop on top of Snowflake in languages other than SQL
  • Python models in dbt execute Python code in your data platform. If that data platform is Snowflake, then Snowpark is used
  • Python models should be used for complex use cases that cannot be handled by SQL. These should make up no more than 5% of your project
  • Snowpark can be used for geospatial analysis, which would be difficult to do in SQL
  • When using Snowpark with dbt, a stored procedure is created and run to produce a table
Related Articles

Register for Coalesce 2024

Join us in-person or online for the largest analytics engineering conference. Level-up your skillset, expand your network, and build your path at Coalesce 2024.